Active Learning for Building a Corpus of Questions for Parsing

Authors

  • Jordi Atserias Batalla
  • Giuseppe Attardi
  • Maria Simi
  • Hugo Zaragoza
Abstract

This paper describes how we built a dependency Treebank for questions. The questions for the Treebank were drawn from the TREC 10 QA task and from Yahoo! Answers. One use of the corpus is to train a dependency parser that achieves good accuracy on questions without hurting its overall accuracy. We also explore active learning techniques to determine a suitable size for a corpus of questions, so as to achieve adequate accuracy while minimizing the annotation effort.
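
As an illustration of the active learning idea mentioned in the abstract, here is a minimal sketch of one common selection step: rank the unannotated questions by how unsure the parser is and send the least confident ones to annotators first. The abstract does not say which selection criterion the authors used, so the function `select_for_annotation`, the confidence scores, and the example questions are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def select_for_annotation(
    pool: Sequence[str],
    confidence: Callable[[str], float],
    batch_size: int,
) -> Tuple[List[str], List[str]]:
    """Split the pool into (to_annotate, remaining) by ascending confidence."""
    # Least confident questions come first; sorted() is stable for ties.
    ranked = sorted(pool, key=confidence)
    return ranked[:batch_size], ranked[batch_size:]


# Hypothetical scores standing in for a real parser's confidence output.
toy_scores = {
    "What is the capital of Peru?": 0.91,
    "How do I get rid of hiccups?": 0.42,
    "Who wrote Hamlet?": 0.88,
    "Why is my cat sneezing so much lately?": 0.35,
}

to_annotate, remaining = select_for_annotation(
    list(toy_scores), lambda q: toy_scores[q], batch_size=2
)
print(to_annotate)
```

In a full loop, the selected questions would be annotated by hand, added to the training set, and the parser retrained; tracking accuracy after each round gives one way to judge when the corpus is large enough.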


Similar resources

The Effect of Morphology on Dependency Parsing of Persian

Data-driven systems can be adapted easily to different languages and domains. This trend led to the introduction of data-driven approaches in dependency parsing. The only prerequisite for data-driven approaches is the existence of appropriate corpora that contain sentences and their associated dependency trees. Despite obtaining highly accurate results for the dependency parsing task in English langu...


Active participation of students in teaching

Active participation of students in teaching is one of the effective ways of learning in science education, according to extensive research. In this way students understand their current level of knowledge, and their eagerness and attraction for learning increase. This study was designed to explore the effect of active student involvement in teaching and learning of the bacteriology in...


Domain Adaptation by Active Learning

We tackled the Evalita 2011 Domain Adaptation task with a strategy of active learning. The DeSR parser can be configured to provide different measures of perplexity in its own ability to parse sentences correctly. After parsing sentences in the target domain, a small number of the sentences with the highest perplexity were selected, revised manually and added to the training corpus in order to ...


Active Learning for Statistical Natural Language Parsing

It is necessary to have a (large) annotated corpus to build a statistical parser. Acquisition of such a corpus is costly and time-consuming. This paper presents a method to reduce this demand using active learning, which selects what samples to annotate, instead of annotating blindly the whole training corpus. Sample selection for annotation is based upon “representativeness” and “usefulness”. ...


Sample Selection for Statistical Parsing

Corpus-based statistical parsing relies on using large quantities of annotated text as training examples. Building this kind of resource is expensive and labor-intensive. This work proposes to use sample selection to find helpful training examples and reduce human effort spent on annotating less informative ones. We consider several criteria for predicting whether unlabeled data might be a help...




Publication date: 2010